
    Modeling temporal dimensions of semistructured data

    In this paper we propose an approach to correctly manage valid-time semantics for semistructured temporal clinical information. In particular, we use a graph-based data model to represent radiological clinical data, focusing on the patient model of the well-known DICOM standard, and define the set of (graphical) constraints needed to guarantee that the history of the given application domain is consistent.
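    As a rough illustration of the kind of history-consistency condition such graphical constraints can encode, here is a minimal sketch, assuming a hypothetical valid-time history of patient attribute versions (the names and structure are illustrative, not the DICOM patient model itself): successive versions must carry well-formed, non-overlapping valid-time intervals.

```python
from datetime import date

# Hypothetical valid-time history: each entry is (value, valid_from, valid_to).
history = [
    ("Patient name v1", date(2020, 1, 1), date(2021, 6, 30)),
    ("Patient name v2", date(2021, 7, 1), date(2023, 3, 15)),
]

def valid_time_consistent(history):
    """Check that valid-time intervals are well-formed and non-overlapping."""
    ordered = sorted(history, key=lambda e: e[1])
    for (_, start, end) in ordered:
        if start > end:                      # each interval must be well-formed
            return False
    for (_, _, prev_end), (_, next_start, _) in zip(ordered, ordered[1:]):
        if next_start <= prev_end:           # successive versions must not overlap
            return False
    return True

print(valid_time_consistent(history))  # True for the sample history
```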

    Tracking Data Provenance of Archaeological Temporal Information in Presence of Uncertainty

    The interpretation process is one of the main tasks performed by archaeologists who, starting from ground data about evidence and findings, incrementally derive knowledge about ancient objects or events. Very often more than one archaeologist contributes, at different time instants, to the discovery of details about the same finding; it is therefore important to keep track of the history and provenance of the overall knowledge discovery process. To this aim, we propose a model and a set of derivation rules for tracking and refining data provenance during the archaeological interpretation process. In particular, among all the possible interpretation activities, we concentrate on dating, which archaeologists perform to assign one or more time intervals to a finding in order to define its lifespan on the temporal axis. In this context, we propose a framework to represent and derive updated provenance data about temporal information after the mentioned derivation process. Archaeological data, and in particular their temporal dimension, are typically vague, since many different interpretations can coexist; we therefore use Fuzzy Logic to assign a degree of confidence to values and Fuzzy Temporal Constraint Networks to model the relationships between the dating of different findings, represented as a graph-based dataset. The derivation rules used to infer more precise temporal intervals are enriched to also manage provenance information and its updates after a derivation step. A MapReduce version of the path consistency algorithm is also proposed to improve the efficiency of the refinement process on big graph-based datasets.
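    To make the refinement step concrete, the following is a minimal sketch of path consistency over a network of crisp temporal-distance constraints between findings. It is a simplification of what the paper describes: the fuzzy constraints and confidence degrees are not modeled, and the node names and bounds are hypothetical.

```python
import itertools

# Hypothetical dating constraints between findings A, B, C, expressed as
# admissible bounds (in years) on the difference date(j) - date(i).
constraints = {
    ("A", "B"): (0, 50),
    ("B", "C"): (10, 30),
    ("A", "C"): (-100, 100),
}
nodes = ["A", "B", "C"]

def get(c, i, j):
    """Return the (lo, hi) bound on date(j) - date(i), inverting if needed."""
    if (i, j) in c:
        return c[(i, j)]
    lo, hi = c[(j, i)]
    return (-hi, -lo)

def path_consistency(c, nodes):
    """Tighten every constraint C(i,j) with the composition of C(i,k) and C(k,j)."""
    changed = True
    while changed:
        changed = False
        for i, k, j in itertools.permutations(nodes, 3):
            lo_ij, hi_ij = get(c, i, j)
            lo_ik, hi_ik = get(c, i, k)
            lo_kj, hi_kj = get(c, k, j)
            new = (max(lo_ij, lo_ik + lo_kj), min(hi_ij, hi_ik + hi_kj))
            if new != (lo_ij, hi_ij):
                c[(i, j)] = new
                c.pop((j, i), None)
                changed = True
    return c

print(path_consistency(dict(constraints), nodes)[("A", "C")])  # tightened to (10, 80)
```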

    Operational and abstract semantics of the query language G-Log

    The amount and variety of data available electronically have dramatically increased in the last decade; however, data and documents are stored in different ways and do not usually show their internal structure. In order to take full advantage of the topological structure of digital documents, and in particular of web sites, their hierarchical organization should be exploited by introducing a notion of query similar to the one used in database systems. A good approach, in that respect, is the one provided by graphical query languages, originally designed to model object bases and later proposed for semistructured data, like G-Log. The aim of this paper is to provide a suitable graph-based semantics to this language, supporting both data structure variability and topological similarity between queries and document structures. A suite of operational semantics based on the notion of bisimulation is introduced both at the concrete level (instances) and at the abstract level (schemata), giving rise to a semantic framework that benefits from the cross-fertilisation of tools originally designed in quite different research areas (databases, concurrency, static analysis).
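    The bisimulation idea the semantics builds on can be sketched as follows: a naive partition-refinement check on small node-labelled graphs, where two nodes stay equivalent only if they carry the same label and reach the same classes. The graph encoding and labels are hypothetical, not the G-Log formalism itself.

```python
# Hypothetical document graphs: adjacency lists plus node labels.
graph = {
    "doc1": ["sec1", "sec2"],
    "sec1": [],
    "sec2": [],
    "doc2": ["sec3"],
    "sec3": [],
}
labels = {"doc1": "document", "doc2": "document",
          "sec1": "section", "sec2": "section", "sec3": "section"}

def bisimulation_classes(graph, labels):
    """Iteratively refine the label-based partition until it stabilises."""
    block = dict(labels)  # start from the partition induced by labels
    while True:
        signature = {
            n: (block[n], frozenset(block[m] for m in graph[n]))
            for n in graph
        }
        if all((signature[a] == signature[b]) == (block[a] == block[b])
               for a in graph for b in graph):
            return signature
        block = signature

classes = bisimulation_classes(graph, labels)
print(classes["doc1"] == classes["doc2"])  # True: both reach only 'section' nodes
print(classes["sec1"] == classes["sec3"])  # True: all sections are leaves
```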

    Semi-automatic support for evolving functional dependencies

    During the life of a database, systematic and frequent violations of a given constraint may suggest that the represented reality is changing and thus the constraint should evolve with it. In this paper we propose a method and a tool to (i) find the functional dependencies that are violated by the current data, and (ii) support their evolution when it is necessary to update them. The method relies on the use of confidence, as a measure that is associated with each dependency and allows us to understand “how far” the dependency is from correctly describing the current data; and of goodness, as a measure of balance between the data satisfying the antecedent of the dependency and those satisfying its consequent. Our method compares favorably with literature that approaches the same problem in a different way, and performs effectively and efficiently as shown by our tests on both real and synthetic databases.
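    As a minimal sketch of the confidence measure, under the common interpretation that confidence is the largest fraction of tuples that can be kept while the dependency holds (the table, attributes and data below are hypothetical, and the goodness measure is not modeled):

```python
from collections import Counter, defaultdict

# Hypothetical employee table; we measure how far the data are from
# satisfying the functional dependency department -> manager.
rows = [
    {"department": "sales", "manager": "Ada"},
    {"department": "sales", "manager": "Ada"},
    {"department": "sales", "manager": "Bob"},   # violates the dependency
    {"department": "it",    "manager": "Carla"},
]

def fd_confidence(rows, lhs, rhs):
    """Largest fraction of rows that can be kept so that lhs -> rhs holds."""
    groups = defaultdict(Counter)
    for row in rows:
        groups[row[lhs]][row[rhs]] += 1
    keepable = sum(max(counter.values()) for counter in groups.values())
    return keepable / len(rows)

print(fd_confidence(rows, "department", "manager"))  # 0.75
```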

    A graph-based meta-model for heterogeneous data management

    The wave of interest in data-centric applications has spawned a high variety of data models, making it extremely difficult to evaluate, integrate or access them in a uniform way. Moreover, many recent models are too specific to allow immediate comparison with the others and do not easily support incremental model design. In this paper, we introduce GSMM, a meta-model based on the use of a generic graph that can be instantiated to a concrete data model by simply providing values for a restricted set of parameters and some high-level constraints, themselves represented as graphs. In GSMM, the concept of data schema is replaced by that of constraint, which allows the designer to impose structural restrictions on data in a very flexible way. GSMM includes GSL, a graph-based language for expressing queries and constraints that, besides being applicable to data represented in GSMM, can in principle be specialised and used for existing models where no language was defined. We show some sample applications of GSMM for deriving and comparing classical data models like the relational model, plain XML data, XML Schema, and time-varying semistructured data. We also show how GSMM can represent more recent modelling proposals: triple stores, the BigTable model and Neo4j, a graph-based model for NoSQL data. A prototype showing the potential of the approach is also described.
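    A minimal sketch of the generic-graph idea, assuming illustrative node kinds and edge labels rather than GSMM's actual parameter set: a single labelled-graph structure instantiated to a tiny relational-style dataset simply by choosing how nodes and edges are labelled.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    kind: str                 # e.g. "relation", "tuple", "value"
    label: str

@dataclass
class Graph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)   # (source, edge_label, target)

    def add(self, node):
        self.nodes.append(node)
        return node

# Instantiating the generic graph as a relational-style dataset:
# a relation node, one tuple node, and an attribute edge to a value node.
g = Graph()
person = g.add(Node("relation", "Person"))
t1 = g.add(Node("tuple", "t1"))
name = g.add(Node("value", "Alice"))
g.edges.append((person, "has_tuple", t1))
g.edges.append((t1, "name", name))

print([(s.label, e, t.label) for s, e, t in g.edges])
```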

    CoPart: a context-based partitioning technique for big data

    The MapReduce programming paradigm is frequently used to process and analyse huge amounts of data. This paradigm relies on the ability to apply the same operation in parallel on independent chunks of data. The consequence is that the overall performance greatly depends on the way data are partitioned among the various computation nodes. The default partitioning technique, provided by systems like Hadoop or Spark, basically performs a random subdivision of the input records, without considering their nature and the correlations between them. Even if such an approach can be appropriate in the simplest cases, where all the input records always have to be analyzed, it becomes a limitation for more sophisticated analyses, in which correlations between records can be exploited to preliminarily prune unnecessary computations. In this paper we design a context-based multi-dimensional partitioning technique, called COPART, which takes data correlation into account in order to determine how records are subdivided between splits (i.e., units of work assigned to a computation node). More specifically, it considers not only the correlation of data w.r.t. contextual attributes, but also the distribution of each contextual dimension in the dataset. We experimentally compare our approach with existing ones, considering both quality criteria and query execution times.
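    As a simplified, one-dimensional illustration of context-based partitioning (not the actual COPART algorithm; the attribute name and the quantile-based binning rule are assumptions), records can be assigned to splits by binning a contextual attribute along its observed distribution, so that records with similar context end up in the same split even when the dimension is skewed.

```python
import statistics

# Hypothetical records with a contextual attribute: the hour of the day.
records = [{"id": i, "hour": h} for i, h in enumerate([1, 2, 2, 3, 9, 10, 22, 23])]
num_splits = 2

hours = sorted(r["hour"] for r in records)
# Quantile boundaries over the observed distribution of the context dimension.
boundaries = statistics.quantiles(hours, n=num_splits)  # e.g. [6.0] for 2 splits

def split_of(record):
    """Index of the split whose context range contains this record."""
    for i, b in enumerate(boundaries):
        if record["hour"] <= b:
            return i
    return len(boundaries)

splits = {}
for r in records:
    splits.setdefault(split_of(r), []).append(r["id"])
print(splits)  # records with similar hours end up in the same split
```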

    Tracking social provenance in chains of retweets

    In the era of massive sharing of information, the term social provenance denotes the ownership, source or origin of a piece of information that has been propagated through social media. Tracking the provenance of information is becoming increasingly important as social platforms acquire more relevance as a source of news. In this scenario, Twitter is considered one of the most important social networks for information sharing and dissemination, which can be accelerated through the use of retweets and quotes. However, the Twitter API does not provide a complete tracking of retweet chains, since only the connection between a retweet and the original post is stored, while all the intermediate connections are lost. This can limit the ability to track the diffusion of information, as well as to estimate the importance of specific users, who can rapidly become influencers, in the news dissemination. This paper proposes an innovative approach for rebuilding the possible chains of retweets and for estimating the contribution given by each user to the information spread. For this purpose, we define the concept of Provenance Constraint Network and a modified version of the Path Consistency Algorithm. An application of the proposed technique to a real-world dataset is presented at the end of the paper.
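    One plausible way to rebuild a chain when the API only links every retweet to the original post can be sketched as follows: order the retweets by timestamp and attach each one to the most recent earlier retweeter that the user follows. The follow relation and the linking rule below are assumptions for illustration, not the Provenance Constraint Network of the paper.

```python
# Retweets of the same original post, as returned by an API that only
# stores the link to the original tweet (timestamps in seconds).
retweets = [
    {"user": "bob",   "time": 10},
    {"user": "carol", "time": 20},
    {"user": "dave",  "time": 30},
]
original_author = "alice"
# Hypothetical follow relation: follower -> set of followed accounts.
follows = {
    "bob":   {"alice"},
    "carol": {"alice", "bob"},
    "dave":  {"carol"},
}

def rebuild_chain(retweets, original_author, follows):
    """Link each retweet to the most recent earlier source the user follows."""
    chain = []
    ordered = sorted(retweets, key=lambda r: r["time"])
    for i, rt in enumerate(ordered):
        candidates = [p["user"] for p in ordered[:i] if p["user"] in follows[rt["user"]]]
        source = candidates[-1] if candidates else original_author
        chain.append((source, rt["user"]))
    return chain

print(rebuild_chain(retweets, original_author, follows))
# [('alice', 'bob'), ('bob', 'carol'), ('carol', 'dave')]
```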

    A Context-Aware Recommendation System with a Crowding Forecaster

    Recommendation systems (RSs) have become increasingly popular in recent years. Many big IT companies, like Google, Amazon and Netflix, have an RS at the core of their business. In this paper, we propose a modular platform for enhancing an RS for the tourism domain with a crowding forecaster, which is able to produce an estimation of the current and future occupation of different Points of Interest (PoIs) by also taking contextual aspects into consideration. The main advantages of the proposed system are its modularity and the ability to be easily tailored to different application domains. Moreover, the use of standard and pluggable components allows the system to be integrated into different application scenarios.
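    A minimal sketch of how a crowding forecaster could plug into an RS: the relevance score of each PoI is discounted by its forecast occupancy before ranking. The scores, the stand-in forecaster and the discount rule are hypothetical, not the platform's actual components.

```python
# Hypothetical relevance scores from a recommender and occupancy forecasts
# (0.0 = empty, 1.0 = full) from a pluggable crowding forecaster.
relevance = {"museum": 0.9, "castle": 0.8, "park": 0.6}

def forecast_occupancy(poi, hour):
    """Stand-in crowding forecaster; a real module would use context data."""
    return {"museum": 0.95, "castle": 0.40, "park": 0.10}[poi]

def recommend(relevance, hour, crowd_weight=0.5):
    """Rank PoIs by relevance discounted by forecast crowding."""
    scored = {
        poi: score * (1 - crowd_weight * forecast_occupancy(poi, hour))
        for poi, score in relevance.items()
    }
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

print(recommend(relevance, hour=15))
# the crowded museum drops below the less busy castle and park
```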

    Database challenges for exploratory computing

    Helping users to make sense of very big datasets is nowadays considered an important research topic. However, the tools that are available for data analysis purposes typically address professional data scientists, who, besides a deep knowledge of the domain of interest, master one or more of the following disciplines: mathematics, statistics, computer science, computer engineering, and programming. On the contrary, in our vision it is vital to support also different kinds of users who, for various reasons, may want to analyze the data and obtain new insight from them. Examples of these data enthusiasts [4, 9] are journalists, investors, or politicians: non-technical users who can draw great advantage from exploring the data, achieving new and essential knowledge, instead of reading query results with tons of records. The term data exploration generally refers to a data user being able to find her way through large amounts of data in order to gather the necessary information. A more technical definition comes from the field of statistics, introduced by Tukey [12]: with exploratory data analysis the researcher explores the data in many possible ways, including the use of graphical tools like boxplots or histograms, gaining knowledge from the way data are displayed. Despite the emphasis on visualization, exploratory data analysis still assumes that the user understands at least the basics of statistics, while in this paper we propose a paradigm for database exploration which is in turn inspired by the exploratory computing vision [2]. We may describe exploratory computing as the step-by-step “conversation” of a user and a system that “help each other” to refine the data exploration process, ultimately gathering new knowledge that concretely fulfils the user needs. The process is seen as a conversation since the system provides active support: it not only answers user’s requests, but also suggests one or more possible actions that may help the user to focus the exploratory session. This activity may entail the use of a wide range of different techniques, including the use of statistics and data analysis, query suggestion, advanced visualization tools, etc. The closest analogy [2] is that of a human-to-human dialogue, in which two people talk, and continuously make reference to their lives, priorities, knowledge and beliefs, leveraging them in order to provide the best possible contribution to the dialogue. In essence, through the conversation they are exploring themselves as well as the information that is conveyed through their words. This exploration process therefore means investigation, exploration-seeking, comparison-making, and learning altogether. It is most appropriate for big collections of semantically rich data, which typically hide precious knowledge behind their complexity. In this broad and innovative context, this paper intends to make a significant step further: it proposes a model to concretely perform this kind of exploration over a database. The model is general enough to encompass most data models and query languages that have been proposed for data management in the last few years. At the same time, it is precise enough to provide a first formalization of the problem and reason about the research challenges posed to database researchers by this new paradigm of interaction.